contextual entity
MARRS: Multimodal Reference Resolution System
Ates, Halim Cagri, Bhargava, Shruti, Li, Site, Lu, Jiarui, Maddula, Siddhardha, Moniz, Joel Ruben Antony, Nalamalapu, Anil Kumar, Nguyen, Roman Hoang, Ozyildirim, Melis, Patel, Alkesh, Piraviperumal, Dhivya, Renkens, Vincent, Samal, Ankit, Tran, Thy, Tseng, Bo-Hsiang, Yu, Hong, Zhang, Yuan, Zou, Rong
Successfully handling context is essential for any dialog understanding task. This context may be conversational (relying on previous user queries or system responses), visual (relying on what the user sees, for example, on their screen), or background (based on signals such as a ringing alarm or playing music). In this work, we present an overview of MARRS, or Multimodal Reference Resolution System, an on-device framework within a Natural Language Understanding system, responsible for handling conversational, visual, and background context. In particular, we present different machine learning models to enable handling contextual queries: one to enable reference resolution, and one to handle context via query rewriting. We also describe how these models complement each other to form a unified, coherent, lightweight system that can understand context while preserving user privacy.
- Oceania > Australia (0.04)
- North America > United States > Ohio (0.04)
- Europe > France (0.04)
- Overview (0.55)
- Research Report (0.50)
Robust Acoustic and Semantic Contextual Biasing in Neural Transducers for Speech Recognition
Fu, Xuandi, Sathyendra, Kanthashree Mysore, Gandhe, Ankur, Liu, Jing, Strimel, Grant P., McGowan, Ross, Mouchtaris, Athanasios
Attention-based contextual biasing approaches have shown significant improvements in the recognition of generic and/or personal rare words in End-to-End Automatic Speech Recognition (E2E ASR) systems like neural transducers. These approaches employ cross-attention to bias the model towards specific contextual entities injected as bias phrases to the model. Prior approaches typically relied on subword encoders for encoding the bias phrases. However, subword tokenizations are coarse and fail to capture granular pronunciation information, which is crucial for biasing based on acoustic similarity. In this work, we propose to use lightweight character representations to encode fine-grained pronunciation features to improve contextual biasing guided by acoustic similarity between the audio and the contextual entities (termed acoustic biasing). We further integrate pretrained neural language model (NLM) based encoders to encode the utterance's semantic context along with contextual entities to perform biasing informed by the utterance's semantic context (termed semantic biasing). Experiments using a Conformer Transducer model on the Librispeech dataset show a 4.62% to 9.26% relative WER improvement on different biasing list sizes over the baseline contextual model when incorporating our proposed acoustic and semantic biasing approach. On a large-scale in-house dataset, we observe a 7.91% relative WER improvement compared to our baseline model. On tail utterances, the improvements are even more pronounced, with 36.80% and 23.40% relative WER improvements on Librispeech rare words and an in-house test set, respectively.
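To make the cross-attention idea in the abstract above concrete, here is a minimal sketch, not the paper's model. It assumes a toy character-level encoder (hashed character-bigram counts, named `char_embed` here purely for illustration) in place of the learned character encoder, a scaled dot product as the attention score, and an all-zero "no-bias" slot so the query can attend to nothing. The query would normally come from the audio encoder; in this sketch it is faked by encoding a phrase.

```python
import math

def char_embed(phrase, dim=16):
    """Toy character-level encoder (assumption): hashed bigram counts, L2-normalised."""
    vec = [0.0] * dim
    text = f"#{phrase.lower()}#"  # '#' marks phrase boundaries
    for a, b in zip(text, text[1:]):
        vec[(ord(a) * 31 + ord(b)) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def bias_attention(query, bias_phrases, dim=16, scale=8.0):
    """Cross-attention of one query vector over the encoded bias phrases.

    Index 0 of the returned weights is the zero-valued 'no-bias' slot;
    index i + 1 corresponds to bias_phrases[i].
    """
    keys = [[0.0] * dim] + [char_embed(p, dim) for p in bias_phrases]
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    # The context vector is the attention-weighted sum of the keys.
    context = [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(dim)]
    return weights, context

# Stand-in for an audio-encoder frame that "sounds like" the first bias phrase.
query = char_embed("john smith")
weights, context = bias_attention(query, ["john smith", "play music", "weather"])
print([round(w, 3) for w in weights])
```

In a real transducer the context vector would be combined with the encoder or prediction-network state; how that combination is done is left out here because the abstract does not specify it.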
Currently There Are Four Distinct Chatbot Dialog Development Approaches
When different vendors and platforms converge on the same basic approach and principles, it is safe to assume it is the most efficient way of doing it. Elements such as form or slot filling and policies are also important, but for the purpose of this story we will not focus on them. Intents are usually defined by a description and a few training or utterance examples. The utterance examples are what a user is anticipated to say or state.
DKN: Deep Knowledge-Aware Network for News Recommendation
Wang, Hongwei, Zhang, Fuzheng, Xie, Xing, Guo, Minyi
Online news recommender systems aim to address the information explosion of news and make personalized recommendations for users. In general, news language is highly condensed, full of knowledge entities and common sense. However, existing methods are unaware of such external knowledge and cannot fully discover latent knowledge-level connections among news. The recommended results for a user are consequently limited to simple patterns and cannot be extended reasonably. Moreover, news recommendation also faces the challenges of high time-sensitivity of news and dynamic diversity of users' interests. To solve the above problems, in this paper, we propose a deep knowledge-aware network (DKN) that incorporates knowledge graph representation into news recommendation. DKN is a content-based deep recommendation framework for click-through rate prediction. The key component of DKN is a multi-channel and word-entity-aligned knowledge-aware convolutional neural network (KCNN) that fuses semantic-level and knowledge-level representations of news. KCNN treats words and entities as multiple channels, and explicitly keeps their alignment relationship during convolution. In addition, to address users' diverse interests, we also design an attention module in DKN to dynamically aggregate a user's history with respect to current candidate news. Through extensive experiments on a real online news platform, we demonstrate that DKN achieves substantial gains over state-of-the-art deep recommendation models. We also validate the efficacy of the usage of knowledge in DKN.
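The two mechanisms named in the abstract above, a word-entity-aligned multi-channel convolution and a candidate-conditioned attention over the user's history, can be sketched in a few lines. This is an illustration under stated assumptions, not the DKN implementation: the function names, the two-channel layout, the ReLU and max-over-time pooling, and the dot-product attention score are all choices made here for brevity.

```python
import math

def kcnn_encode(word_vecs, entity_vecs, filters, window=2):
    """Multi-channel convolution with max-over-time pooling.

    word_vecs[i] and entity_vecs[i] describe the same token position, so the
    word/entity alignment is kept; tokens without an entity use a zero vector.
    Each filter is indexed as filt[offset][channel][dim], channel 0 = word,
    channel 1 = entity.
    """
    if len(word_vecs) != len(entity_vecs) or len(word_vecs) < window:
        raise ValueError("channels must be aligned and at least `window` long")
    aligned = list(zip(word_vecs, entity_vecs))
    features = []
    for filt in filters:
        activations = []
        for start in range(len(aligned) - window + 1):
            total = 0.0
            for offset in range(window):
                for channel in range(2):
                    vec = aligned[start + offset][channel]
                    total += sum(v * w for v, w in zip(vec, filt[offset][channel]))
            activations.append(max(0.0, total))  # ReLU
        features.append(max(activations))  # max-over-time pooling
    return features

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def user_embedding(history, candidate):
    """Attention-weighted sum of clicked-news vectors, conditioned on the candidate.

    Dot-product scoring is an assumption made for this sketch.
    """
    scores = [sum(h * c for h, c in zip(item, candidate)) for item in history]
    weights = softmax(scores)
    pooled = [sum(w * item[i] for w, item in zip(weights, history))
              for i in range(len(candidate))]
    return weights, pooled

# Three-token title; only the second token links to a knowledge-graph entity.
words = [[1, 0], [0, 1], [1, 1]]
entities = [[0, 0], [1, 0], [0, 0]]
one_filter = [[[1, 0], [0, 0]], [[0, 1], [1, 0]]]
print(kcnn_encode(words, entities, [one_filter]))  # [3.0]

weights, user_vec = user_embedding([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
print([round(w, 3) for w in weights])
```

Keeping the word and entity vectors zipped by position is what preserves the alignment: each filter sees both channels of the same tokens at once, rather than two independently convolved sequences.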
- Asia > North Korea (0.28)
- Asia > Middle East > Iran (0.14)
- North America > United States > Nevada > Clark County > Las Vegas (0.05)
- Transportation > Ground > Road (0.93)
- Media > News (0.86)
- Government > Regional Government (0.69)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)